Abstract

Background: Legionella pneumophila is an intracellular pathogen of environmental protozoa. When humans inhale
contaminated aerosols, this bacterium may cause a severe pneumonia called Legionnaires' disease. Despite the
abundance of dozens of Legionella species in aquatic reservoirs, the vast majority of human disease is caused by a
single serogroup (Sg) of a single species, namely L. pneumophila Sg1. To gain further insight into the genome
dynamics and evolution of Sg1 strains, we sequenced strains Lorraine and HL 0604 1035 (Sg1) and compared them
to the available sequences of Sg1 strains Paris, Lens, Corby and Philadelphia, resulting in a comprehensive
multigenome analysis.

Results: We show that L. pneumophila Sg1 has a highly conserved and syntenic core genome that comprises the
many eukaryotic-like proteins and a conserved repertoire of over 200 Dot/Icm type IV secreted substrates.
However, recombination events and horizontal gene transfer are frequent. In particular, analysis of the
distribution of nucleotide polymorphisms suggests that large chromosomal fragments of over 200 kb are
exchanged between L. pneumophila strains and contribute to the genome dynamics in the natural population. The
many secretion systems present might be implicated in the exchange of these fragments by conjugal transfer.
Plasmids also play a role in genome diversification; they are exchanged among strains and circulate between
different Legionella species.

Conclusion: Horizontal gene transfer among bacteria and from eukaryotes to L. pneumophila, as well as
recombination between strains, allow different clones to evolve into predominant disease clones and others to
replace them subsequently within relatively short periods of time.



Background 

Legionella pneumophila is the etiologic agent of Legion- 
naires' disease, an atypical pneumonia, which is often 
fatal if not treated promptly. However, it is principally 
an environmental bacterium that inhabits fresh water 
reservoirs worldwide where it parasitizes within free-liv- 
ing protozoa but also survives in biofilms [1-3]. Since L. 
pneumophila does not spread from person-to-person, 
humans have been inconsequential for the evolution of 
this pathogen. Instead, the virulence strategies of L. 
pneumophila have been shaped by selective pressures in 
aquatic ecosystems. Indeed, the co-evolution of L. pneumophila
with fresh-water amoebae is reflected in its
genome sequence. The analysis of two L. pneumophila
genomes identified an unexpectedly high number and
variety of eukaryotic-like proteins and proteins
containing motifs mainly found in eukaryotes [4].
These proteins were predicted to interfere in different
steps of the infectious cycle by mimicking functions of
eukaryotic proteins [4]. Several of these eukaryotic-like
proteins have recently been shown to be
secreted effectors that help L. pneumophila to subvert
host functions to allow intracellular replication [5,6].
The possibility that L. pneumophila has acquired at least 
some of these genes through horizontal gene transfer 
from eukaryotes has been suggested by two studies [7,8]. 



© 2011 Gomez-Valero et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.



Gomez-Valero et al. BMC Genomics 2011, 12:536
http://www.biomedcentral.com/1471-2164/12/536






Plasticity is another specific feature of the L. pneumo- 
phila genomes as integrative plasmids, putative conjuga- 
tion elements and genomic islands were identified. In 
addition to DNA interchange between different bacterial 
genera and even domains of life, horizontal gene trans- 
fer within the genus Legionella and within the species L. 
pneumophila has been reported. For example a 65-kb 
pathogenicity island described first in L. pneumophila 
strain Philadelphia [9] is present in several L. pneumo- 
phila strains and also in other Legionella species like L. 
anisa [10]. Another example is the particular lipopoly- 
saccharide cluster of serogroup 1 strains that has been 
detected in L. pneumophila strains of different lineages 
and genetic backgrounds [10]. L. pneumophila has all 
necessary features for incorporating foreign DNA, as 
these bacteria are naturally competent and possess an 
intact recombination machinery [11,12]. These findings 
suggest that the L. pneumophila genomes are very 
dynamic and one would expect that horizontal gene 
transfer and recombination events play an important 
role in their evolution. 

However, different analyses, such as early studies applying
multilocus enzyme electrophoresis (MEE), supported a
clonal population structure of L. pneumophila [13]. Two
recent reports using genetic profiling based on six or
three genetic loci, respectively, also concluded that L.
pneumophila shows a clonal population structure
[14,15], although the presence of a few recombination
events was not ruled out. Later the analysis of the dotA,
mip and rpoB genes in different isolates suggested for 
the first time that recombination may play some role in 
L. pneumophila evolution [16-18] and a more in depth 
analysis using over 20 loci suggested that recombination 
events might be more frequent than was previously 
thought [19]. However, comparisons of these studies are 
difficult due to different sampling and different analysis 
methods used. Furthermore there may be a bias asso- 
ciated with some of the genes selected in these studies 
like intergenic spacer regions or genes under positive 
selection that may lead to artefactual effects in detecting 
recombination. To solve these problems, efforts have
been undertaken recently to homogenize the results
obtained for different species to allow comparisons [20].
These authors report for L. pneumophila a low recombi- 
nation rate like for the obligate pathogens Bordetella 
pertussis or Bartonella henselae. In contrast Coscolla 
and colleagues suggest a more important role for 
recombination at the intergenic level [21]. 

These different results and the fact that a globally dis- 
tributed L. pneumophila clone implicated in Legion- 
naires' disease has been described [10] may suggest that 
the role of recombination is not relevant. However, the 
description of clonal complexes is not incompatible with 
high recombination rates. Transient clones may appear
within a recombining population [22], in particular if
clones with high disease prevalence appear, as
seems to be the case for some L. pneumophila strains.
These clones are often vastly over-sampled due to their
clinical importance and show strong clonality. Thus, a
clonal structure may be correct for this subgroup, but it may
not be representative of the population. Indeed, when analyzing
over 200 clinical and environmental L. pneumophila 
strains, significantly less diversity was found among the 
clinical isolates [23]. 

In this study we investigated the genome dynamics 
and evolution of the species L. pneumophila by analyz- 
ing horizontal gene transfer, mobile genetic elements 
and recombination on a genome-wide level. We undertook
this analysis based on six complete genome
sequences, four of which are the previously published
reference genomes of L. pneumophila Paris, Lens [4],
Corby [24] and Philadelphia [25], and two of which were
sequenced in this study. The newly sequenced strains
were selected according to epidemiological features that
might be reflected in their genomes and should thus
allow genome dynamics to be studied with respect to
virulence. Strain Lorraine is rarely isolated from the
environment, but its prevalence in human disease has
increased considerably in recent years [26]. In contrast,
L. pneumophila strain HL 0604 1035 has been frequently
isolated from a hospital water system for over
10 years but has never caused disease. Analysis of these
six strains identified a highly conserved and syntenic 
core genome and a diverse accessory genome. Further- 
more, it showed that recombination events and horizon- 
tal gene transfer are frequent in L. pneumophila. 
Horizontal gene transfer from eukaryotes as well as 
recombination between strains were identified suggest- 
ing that L. pneumophila genomes are highly dynamic, a 
feature allowing different clones to evolve into predomi- 
nant disease clones and others to replace them subse- 
quently within relatively short periods of time. 

Results and discussion 

The L. pneumophila core genome comprises over 2400
conserved genes that are highly syntenic

To gain comprehensive insight into the genetic basis,
evolution and genome dynamics of L. pneumophila Sg1,
the strains responsible for over 90% of disease worldwide,
we analyzed six completely sequenced genomes.
The strains selected are all of Sg1, have endemic and/or
epidemic character (e.g. Paris, Lorraine or Philadelphia),
and were isolated in different countries (France, England,
Spain, US) and in different years. Two strains were
newly sequenced for this study (Lorraine and HL 0604
1035); the other four L. pneumophila genomes (Paris,
Lens, Philadelphia, Corby) have been published previously
[4,24,25]. The genomes of L. pneumophila

Table 1 General features of the 6 L. pneumophila strains analyzed

L. pneumophila strains        Philadelphia  Paris     Lens      Corby     HL06041035  Lorraine
Chromosome size (bp)          3397754       3503610   3345687   3576469   3492535     3467254
G+C content (%)               38.27         38.37     38.42     38.48     38.35       38.36
No. of genes                  3031          3123      2980      3237      3132        3117
No. of protein coding genes   2999          3078      2921      3193      3079        3080
Pseudogenes                   55            71        84        59        73          48
tRNA                          43            43        43        44        43          44
16S/23S/5S                    3/3/3         3/3/3     3/3/3     3/3/3     3/3/3       3/3/3
Average length CDS (nts)      1082.47       1000.85   1008.76   984.35    995.47      988.54
Average length ig (nts)       147.72        154       152.36    149.24    155.12      155.28
Coding density (%)            88.22         86.93     87.07     87.25     86.94       87.26
Plasmids                      0             1         1         0         0           1

bp, base pairs; nts, nucleotides; CDS, coding sequence; ig, intergenic region



Lorraine and HL 0604 1035 each consist of a single circular
chromosome of 3.4 Mb. Strain Lorraine also contains
a plasmid. As shown in Table 1, the main features
of the six L. pneumophila genomes analyzed (e.g. genome
size, GC content and coding density) are highly
conserved. The core genome of the six L. pneumophila
genomes comprises 2434 genes, which represents about
80% of the predicted genes in each genome. Furthermore,
the gene order is highly conserved, the 260 kb
inversion in strain Lens with respect to the other strains
being the only exception. When the strains are compared
pairwise, on average 90% of the genes are present in both
strains (Figure 1). However, when determining the genes
specific to each genome, i.e. those without orthologs in
the remaining five strains, each strain contains
between 136 (strain HL 0604 1035) and 222 (strain
Corby) strain-specific genes, mainly encoded on mobile
genetic elements. Taken together, the L. pneumophila
genomes have a highly conserved and syntenic backbone
and a highly dynamic accessory genome of about 300
genes each, mainly formed by mobile genetic elements,
genomic islands and genes of unknown function. The
complete annotation of these six genomes is available in
a new database resource that we have set up, LegionellaScope
(https://www.genoscope.cns.fr/agc/microscope/about/collabprojects.php?P_id=27),
and at the Institut
Pasteur, LegioList (http://genolist.pasteur.fr/LegioList/).
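
The quantities discussed above (core genome, strain-specific genes and the pairwise percentage of shared genes) can be illustrated with simple set operations. The following Python sketch is not the pipeline used in this study; the ortholog group names and their distribution are invented toy data, and the pairwise percentage follows the convention given in the legend of Figure 1 (shared genes divided by the number of genes in the smaller genome).

```python
# Toy presence table: ortholog group -> strains carrying a member.
# Group names and memberships are invented for illustration only.
ORTHOLOG_GROUPS = {
    "og1": {"Paris", "Lens", "Corby"},
    "og2": {"Paris", "Lens", "Corby"},
    "og3": {"Paris", "Lens"},
    "og4": {"Corby"},
    "og5": {"Paris"},
}


def genes_by_strain(groups):
    """Invert the table: strain -> set of ortholog groups it carries."""
    by_strain = {}
    for group, strains in groups.items():
        for strain in strains:
            by_strain.setdefault(strain, set()).add(group)
    return by_strain


def core_genome(by_strain):
    """Ortholog groups present in every strain."""
    return set.intersection(*by_strain.values())


def strain_specific(by_strain, strain):
    """Ortholog groups found in `strain` and in no other strain."""
    others = set().union(*(g for s, g in by_strain.items() if s != strain))
    return by_strain[strain] - others


def pairwise_shared_percent(by_strain, a, b):
    """Shared groups as a percentage of the smaller of the two gene sets."""
    shared = len(by_strain[a] & by_strain[b])
    return 100.0 * shared / min(len(by_strain[a]), len(by_strain[b]))


by_strain = genes_by_strain(ORTHOLOG_GROUPS)
core = core_genome(by_strain)
print(sorted(core))
print(sorted(strain_specific(by_strain, "Corby")))
print(pairwise_shared_percent(by_strain, "Paris", "Lens"))
```

In practice the presence table would come from an orthology assignment over the six annotated genomes; the set logic stays the same regardless of how the ortholog groups are defined.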

The species L. pneumophila has a highly conserved core
genome

a) Most eukaryotic like proteins are conserved in all L. 
pneumophila genomes 

The presence of proteins with high similarity to eukaryotic
proteins or proteins with domains preferentially or
only present in eukaryotic genomes is a particular feature
of L. pneumophila [4]. However, the criteria for
identifying these proteins were never clearly defined. To
analyze their evolution and possible origin in depth, we
have thus developed an automatic and systematic
method to identify eukaryotic-like proteins according to
defined criteria. Previously we had identified eukaryotic-like
proteins in L. pneumophila as proteins with the




Figure 1 Shared and specific gene content of 6 L. pneumophila 
genomes. Each petal represents a genome with an associated 
color. The number in the center of the diagram represents the 
orthologous genes shared by all the genomes. The number inside 
of each individual petal corresponds to the specific genes of each 
genome with non-orthologous genes in any of the other genomes. 
The small circles inside of each petal represent the percentage of 
shared genes (total number divided by the number of genes in the 
smallest genome) between the genome of this petal and the 
genome represented by the color of the small circle. For example, a yellow circle
inside an orange petal means that 88% of genes are shared
between Corby and Lorraine.
highest similarity score to eukaryotic proteins according
to BLAST results, or by identifying eukaryotic domains 
[4]. However, due to constantly growing databases 
BLAST results are changing. Furthermore, recent ana- 
lyses of amoeba-associated bacteria, in particular sym- 
bionts of amoeba have shown that they also contain 
eukaryotic like proteins, suggesting multiple origins of 
these proteins in prokaryotes [27]. To get a more com- 
plete picture of eukaryotic like proteins of L. pneumo- 
phila and also to include those proteins that might have 
been transferred independently to different amoeba 
associated bacteria we defined a eukaryotic like protein 
as i) a protein having a better normalized BLAST score
against eukaryotic sequences than against prokaryotic
ones and ii) a protein that did not show BLAST results
against either Legionella spp. or other bacterial species
for which resistance to amoeba infection has been
demonstrated (see material and methods). Applying
these criteria we identified 46 proteins with putative 
eukaryotic origin, of which 17 are described here for the 
first time (Table 2). Given the fact that these proteins 
were probably acquired by HGT one would expect high 
diversity in the repertoire. However, our analyses 
revealed a considerable conservation as more than 50% 
(26) are conserved in all six L. pneumophila strains, 
indicating an ancient transfer. Furthermore, they show 
89-99% nucleotide identity, probably due to high selec- 
tion pressure for their maintenance. Thus most of these 
proteins belong to the core genome, indicating that 
their acquisition has taken place before the speciation of 
L. pneumophila. These 26 proteins might have allowed a 
common Legionella ancestor to colonize an intracellular 
niche or to adapt better to the intercellular environment 
of a specific protozoan species leading to the evolution 
of the species L. pneumophila. Interestingly, 19 of these 
26 proteins are also conserved in L. longbeachae, which 
might thus be those indispensable for intracellular replication
of Legionellae (Table 2) [28].
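
The two criteria used above to define a eukaryotic-like protein can be sketched as a small classifier. This is only an illustration under stated assumptions, not the method of this study: the normalization (bit score divided by the score of the query against itself), the `Hit` record, the `EXCLUDED_TAXA` list and all example organisms and scores are hypothetical, and criterion ii is read here as disregarding hits to Legionella and amoeba-resistant bacteria when scoring the prokaryotic side. The actual procedure is described in the material and methods.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject_taxon: str  # organism of the database hit
    domain: str         # "eukaryote" or "prokaryote"
    bit_score: float


# Hypothetical stand-in for Legionella spp. and amoeba-resistant bacteria.
EXCLUDED_TAXA = {"Legionella", "Protochlamydia"}


def best_normalized(hits, domain, self_score):
    """Best bit score within one domain, divided by the query self-hit score.

    Hits to excluded taxa are ignored; no remaining hit gives 0.0.
    """
    scores = [
        h.bit_score
        for h in hits
        if h.domain == domain and h.subject_taxon not in EXCLUDED_TAXA
    ]
    return max(scores, default=0.0) / self_score


def is_eukaryotic_like(hits, self_score):
    """True if the best eukaryotic hit outscores the best retained prokaryotic hit."""
    euk = best_normalized(hits, "eukaryote", self_score)
    prok = best_normalized(hits, "prokaryote", self_score)
    return euk > prok


example_hits = [
    Hit("Legionella", "prokaryote", 500.0),    # ignored: excluded taxon
    Hit("Acanthamoeba", "eukaryote", 200.0),
    Hit("Escherichia", "prokaryote", 120.0),
]
print(is_eukaryotic_like(example_hits, self_score=500.0))
```

Dividing by the self-hit score makes proteins of different lengths comparable; other normalizations are possible and would change which borderline proteins pass the filter.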
b) Eukaryotic protein motifs are highly conserved among 
the L. pneumophila genomes 

A second class of eukaryotic-like proteins of L. pneumophila
carries domains predominantly present in eukaryotic
proteins. To systematically identify these proteins we
used the InterPro database comprising 10 different
domain search programs [29]. This allowed us to identify
the L. pneumophila proteins carrying eukaryotic
domains in the newly sequenced strains Lorraine and
HL 0604 1035 as well as to identify previously unreported
motifs. Similarly to the eukaryotic-like proteins described
above, over half of the proteins carrying eukaryotic
domains are conserved in all six genomes and
over 80% are conserved when two genomes are compared
(e.g. 33 of the 39 proteins containing a eukaryotic
motif in strain Lens are also present in strain Paris).
Moreover, half of them share very high nucleotide identity
of on average 98-100% (Table 3), again suggesting
high selection pressure to maintain them.

Our approach identified also new eukaryotic domains 
like spectrin repeats. The spectrin repeat forms a three- 
helix bundle and was reported primarily in the animal 
kingdom [30]. These repeats act as modules building 
long, extended molecules that also serve as a docking 
surface for cytoskeletal and signal transduction proteins. 
In L. pneumophila it is present in up to eight proteins 
of each strain (Table 3) and all spectrin repeat proteins 
are predicted to be secreted Dot/Icm substrates [31-33]. 
Another interesting domain is the RAS GEF domain 
that is present in two proteins encoded by strain Paris 
one of which (Lpp0350) is conserved in the six strains 
analyzed. Ras-GEFs are small GTPases typically present 
in eukaryotes that are involved in numerous cellular 
processes like gene expression, cytoskeleton re-organiza- 
tion, microtubule organization and vesicular and nuclear 
transport [34]. GEFs (GDP-GTP exchange factors) regu- 
late Rabs, GTP-binding proteins with conserved func- 
tions in membrane trafficking [35]. Interestingly, 
according to the Pfam database Ras-GEF domains in 
bacteria are only present in Legionella, Parachlamydia 
acanthamoebae and Protochlamydia amoebophila, all of 
which are amoeba-associated bacteria. 

Coiled-coil domains have been identified previously in 
the L. pneumophila genomes as this motif can be found 
in all kingdoms of life. However extended coiled-coil 
domains are largely absent from bacterial genomes but 
are typical for archaea and eukaryotes. We thus 
searched the L. pneumophila genomes and 29 other 
genomes of bacterial pathogens or bacteria present in 
the aquatic environment (Table 4) for proteins with five 
or more coiled-coil domains. Interestingly, Legionella
spp., Streptococcus pneumoniae and Pseudomonas aeruginosa
contain the highest percentage of proteins with
extended coiled-coil domains (6-11 domains) compared 
to the number of predicted proteins encoded in their 
genome and only P. aeruginosa and L. pneumophila 
encode proteins containing more than 10 coiled-coil 
domains (Table 4). Most of these Legionella proteins are 
predicted substrates of the Dot/Icm secretion system 
[31-33,36]. This suggests that large coiled-coil domains 
are specific adaptations to the eukaryotic cell probably 
implicated in interactions with host proteins. 
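
The comparison above rests on two numbers per genome: which proteins carry five or more predicted coiled-coil domains, and what share of the predicted proteome they represent. A minimal Python sketch of that tally is given below; the protein identifiers and domain counts are invented, and the per-protein counts are assumed to come from an upstream coiled-coil prediction step that is not shown.

```python
def extended_coiled_coil_stats(domain_counts, min_domains=5):
    """Select proteins with at least `min_domains` coiled-coil domains.

    domain_counts maps a protein identifier to its number of predicted
    coiled-coil domains. Returns the selected proteins and their share
    of all predicted proteins, in percent.
    """
    extended = {p: n for p, n in domain_counts.items() if n >= min_domains}
    if not domain_counts:
        return extended, 0.0
    percent = 100.0 * len(extended) / len(domain_counts)
    return extended, percent


# Hypothetical counts for a four-protein toy proteome.
toy_counts = {"protA": 12, "protB": 6, "protC": 2, "protD": 0}
extended, percent = extended_coiled_coil_stats(toy_counts)
print(extended, percent)
```

Expressing the result relative to the number of predicted proteins, as in the text, avoids favouring organisms with larger genomes.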
c) High selection pressure acts on the Dot/Icm T4SS and its 
substrates 

Central to the pathogenesis of L. pneumophila are the 
dot/icm loci, which together direct assembly of a type 
IV secretion apparatus [37,38]. Although all L. pneumo- 
phila strains investigated to date contain the complete 
dot/icm loci, sequence variations in the dot/icm
genes among different L. pneumophila strains have been



Table 2 Orthologous eukaryotic-like proteins present in the 6 L. pneumophila strains and in L. longbeachae

Product                                                     Paris           Lens            Corby           Lorraine        HL06041035      Philadelphia    L.lo
[...]
[...]                                                       [...]           [...] 98.95     lpc3266 95.99   lpo3279 99.16   lpv3334 99.26   lpg2951 98.52   llo0076
Putative methyltransferases §                               lpp3025 98.50   lpl2883 97.06   lpc3269 99.30   lpo3282 97.62   lpv3338 97.54   lpg2954 97.76   llo0074
Flavanone 3-dioxygenase §                                   -               -               -               lpo1380         -               -               -
Protein of unknown function §                               -               -               -               lpo1577         -               -               -
Conserved protein of unknown function with SNARE domain §   -               -               lpc2n0 98.97    lpo2553 97.25   lpv2681 98.97   -               -
(S)-2-hydroxy-acid oxidase §                                -               -               -               lpo2960         -               -               -
Protein of unknown function §                               -               -               -               lpo3145 100.00  lpv3199 94.82   -               -
Putative pyridine nucleotide-disulphide oxidoreductase      -               lpl2845 95.59   lpc3225 97.70   lpo3239 98.47   lpv3288 97.80   lpg2917 98.28   -
Regulator of chromosome condensation, rcc                   -               -               -               -               lpv2481 79.24   lpg2224* 99.83  -
Putative metallophosphoesterase §                           -               -               -               -               lpv2663         -               -
Serine carboxypeptidase                                     -               -               -               -               lpv3278 97.64   lpg2911 100     -

* Substrates of the Dot/Icm secretion system; § eukaryotic-like proteins newly identified in this study; numbers, % nucleotide identity to strain Philadelphia; L.lo, Legionella longbeachae
Table 3 Orthologous proteins with eukaryotic motifs present in the 6 L. pneumophila strains and in L. longbeachae

Motif        Paris               Lens                Philadelphia        Lorraine            HL06041035          Corby               L.lo
ANK          lpp0037 96.30       lpl0038 97.40       lpg0038* 97.04      lpo0042 97.89       lpv0043 93.66       lpc0039 97.10       -
ANK          lpp0126 98.94       lpl0111 98.48       lpg0112 94.83       lpo0119 98.79       lpv0127 93.03       lpc0131 92.16       llo1394
ANK          lpp0202             -                   -                   -                   -                   -                   -
ANK          lpp0356             -                   -                   -                   -                   -                   -
ANK          lpp0469 98.94       lpl0445 96.35       lpg0403 95.53       lpo0463 97.64       lpv0501 98.48       lpc2941 98.67       -
ANK          lpp0503 98.37       lpl0479 93.86       lpg0436* 93.31      lpo0501 98.12       lpv0537 98.37       lpc2906 98.37       -
ANK          lpp0547 99.50       lpl0523 96.31       lpg0483* 96.82      lpo0551 99.83       lpv0585 98.16       lpc2861 99.16       llo2705
ANK          lpp0750 100.00      lpl0732 97.65       lpg0695* 100.00     lpo0775 99.84       lpv0817 100.00      lpc2599 98.44       -
ANK          [illegible]         -                   -                   -                   -                   -                   -
ANK + SET    lpp1683 97.68       lpl1682 95.32       lpg1718* 98.41      lpo1757 97.86       lpv1985 96.91       lpc1152 97.25       -
ANK          lpp1905             -                   -                   -                   -                   -                   -
ANK          lpp2058 99.20       lpl2048 90.42       -                   lpo2181 98.05       -                   lpc1566 98.80       -
ANK          lpp2061 99.60       lpl2051 95.95       -                   lpo2185 95.38       -                   lpc1569 96.80       -
ANK          lpp2065 99.93       lpl2055 98.56       -                   lpo2189 98.62       -                   lpc1573 98.03       -
ANK + Fbox   lpp2082 97.40       lpl2072 98.26       lpg2144* 98.84      lpo2207 99.03       lpv2392 93.99       lpc1593 99.22       -
ANK          lpp2166 99.25       lpl2140 99.12       lpg2215* 99.06      lpo2285 97.74       lpv2469 99.18       lpc1680 98.93       -
ANK          lpp2248 99.50       lpl2219 99.14       lpg2300* 99.14      lpo2371 99.43       lpv2567 98.93       lpc1765 99.21       llo0584
ANK          lpp2270 99.64       lpl2242 97.97       lpg2322* 98.34      lpo2399 98.08       lpv2591 99.53       lpc1789 99.53       llo0570
ANK          [illegible] 99.60   lpl2370 97.94       lpg2452 98.19       [illegible] 98.95   [illegible] 99.46   lpc2026 99.46       -
[illegible]  [row illegible in source]
[illegible]  [row illegible in source]
[illegible]  [row illegible in source]
ANK          -                   [illegible] 86.17   -                   [illegible] 95.37   lpv2375 94.96       -                   -
ANK          -                   lpl2339 98.64       lpg2416* 91.21      lpo2601 99.00       lpv2736 99.28       lpc2057 98.98       -
ANK          -                   -                   lpg0402* 100.00     -                   lpv0500 96.01       -                   -
ANK          -                   -                   -                   -                   lpv2258             -                   -
ANK          -                   lpl1681 100.00      -                   -                   -                   lpc1151 97.98       -
ANK          -                   lpl2344 100.00      -                   lpo2607 97.93       -                   -                   -
F-Box        lpp0233 98.58       lpl0234 93.97       [illegible]* 96.81  lpo0202 97.87       lpv0254 98.94       -                   -
F-Box        lpp2486             -                   -                   -                   -                   -                   -
F-Box        -                   -                   lpg2224* 99.83      -                   lpv2482 79.24       -                   -
F-Box        -                   -                   -                   -                   lpv2481             -                   -
RAS GEF      lpp0350 94.53       lpl0328 96.32       lpg0276* 97.33      lpo0327 97.64       lpv0368 97.64       lpc0353             llo0327
RAS GEF§     -                   -                   -                   -                   lpv2258             -                   -
Sec7         lpp1932 98.41       lpl1919 97.40       lpg1950* 92.16      lpo2033 98.32       lpv2243 98.58       lpc1423 97.57       llo1397
[illegible]  [row illegible in source]
Sel-1(?)     [row illegible in source]
Sel-1(?)     [row illegible in source]
Sel-1(?)     [row illegible in source]
Sel-1        lpp2174 99.64       lpl2147 98.48       lpg2222^ 99.56      lpo2292 99.47       lpv2477 99.47       lpc1689 96.27       -
Sel-1        lpp2692 99.25       lpl2564 98.61       lpg2639 98.39       lpo2917 99.28       lpv2979 99.39       lpc0501 98.75       llo2649
Sel-1        -                   -                   -                   lpo3233             -                   -                   -
Spectrin     lpp1848§ 99.18      lpl1845 98.77       lpg1884* 99.01      lpo1944§ 98.93      lpv2158§ 99.18      lpc1331 99.18       -
Spectrin     lpp2246 99.29       lpl2217 98.75       lpg2298* 99.29      lpo2369 99.29       lpv2565 98.27       lpc1763 98.75       llo1707
Spectrin     lpp1930 95.11       -                   lpg1947* 96.65      lpo2029 97.72       -                   -                   -
Spectrin     lpp1309 100.00      -                   lpg1355* 90.59      -                   lpv1468 100.00      -                   -
Spectrin ^   lpp1002 98.01       lpl0971 91.62       lpg0940* 97.92      lpo1024 98.05       lpv1077 97.87       lpc2349 97.15       -
Spectrin §   lpp0471 97.79       lpl0447 97.45       lpg0405* 98.30      lpo0465§ 98.28      lpv0504 98.28       lpc2939 97.70       llo2845
Spectrin §   lpp1843 95.45       lpl1840 97.57       -                   -                   lpv2151 100.00      lpc1323§ 99.60      -
Spectrin §   lpp1173 98.55       lpl1179§ 98.80      lpg1171*§ 98.56     lpo1186 99.28       lpv1326 98.56       lpc0637 97.84       llo3114
STPK         lpp0267 96.95       lpl0262 98.72       lpg0208 93.26       lpo0242 98.92       lpv0288 95.13       lpc0283 97.26       -
STPK         lpp1439 99.12       lpl1545 98.11       lpg1483* 97.67      lpo1483 98.93       lpv1609 99.12       lpc0898 97.61       llo1682
STPK         lpp2626 94.88       lpl2481 98.85       lpg2556* 99.13      lpo2765 98.70       lpv2900 99.13       lpc1906 95.31       llo2218
U-box        lpp2887 99.72       -                   lpg2830* 97.15      lpo3124 99.58       lpv3185 98.75       -                   -

* Substrates of the Dot/Icm secretion system according to previous publications; ^ orthologous proteins where the corresponding motif was not present in the other
genome; § eukaryotic-like proteins newly identified in this study; numbers, nucleotide identity with respect to the L. pneumophila Philadelphia gene; L.lo,
Legionella longbeachae



reported [39]. The dot/icm loci of the six strains ana- 
lyzed here exhibited a very high nucleotide conservation 
of 98-100% among orthologs except for dotA, icmX and 
for icmC of strain Corby that is shorter and more diver- 
gent (84% nucleotide identity) as compared to icmC of 
strain Paris. These results indicate that strong negative 
selection acts on these genes (Table 5). 
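
Percent nucleotide identity between orthologs, as quoted here and in Tables 2, 3 and 5, depends on how alignment gaps are treated. The sketch below shows one common convention (identical positions divided by aligned positions, with columns containing a gap in either sequence excluded); this convention and the example sequences are assumptions for illustration, not a description of how the values in this study were computed.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over the ungapped columns of a pairwise alignment.

    Both strings must be the same length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 5 ungapped columns, 4 of them identical.
print(percent_identity("ACGT-A", "ACGAAA"))
```

Counting gap columns as mismatches instead would lower the value for genes that differ in length, such as icmC of strain Corby, so the convention should be stated whenever identities are compared across studies.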

Since the identification of RalF [40], numerous
approaches have been used to identify Dot/Icm-translocated
substrates. Currently, 278 proteins of L. pneumophila
have been described as being translocated by the
Dot/Icm T4SS [7,31,32,41-44]. Analysis of their
distribution among the six L. pneumophila strains
reveals a very high conservation, as 206 of the 278 substrates
are present in all six strains. Nearly all of them
show a nucleotide similarity of 95-100% and only nine
are specific to strain Philadelphia (Additional file 1,
Table S1). Furthermore, only 34 of the 278 substrates of
strain Philadelphia are missing in strain Paris, 30 in
strain Lorraine and 25 in strain HL 0604 1035 (Additional
file 1, Table S1). Thus, although high redundancy seems
to be present in the repertoire of Dot/Icm effectors, the
strong conservation of nearly all of them in all genomes
argues for their mutual importance for the L. pneumophila
life cycle.

Rare exceptions are RalF and AnkB/Lpp2028. The
nucleotide sequence of ralF of strain Philadelphia is
only 85% similar to the ralF genes of the other strains
and is 72 nts (24 aa) shorter. A similar situation is seen
for lpg2144/ankB, which is 54 nts (18 aa) longer in strains
Philadelphia and Lens than in strains Paris and Corby.
This is surprising, as the C-terminal region of AnkB of
strain Philadelphia contains a eukaryotic prenylation
CAAX motif mediating posttranslational modification of
effector proteins, important for intracellular replication
of L. pneumophila. Lipidation facilitates the localization
of this effector protein to host organelles and serves as a
docking platform for ubiquitinated proteins [45,46].
Thus, in strains Paris and Corby other proteins might
take over this function. Taken together, this analysis
suggests that over 200 of the Dot/Icm substrates of L.
pneumophila were present or were acquired
before speciation and that such a large repertoire of
effectors is indeed necessary for intracellular replication
and adaptation to the specific protozoan hosts.

The species L. pneumophila has a highly dynamic
accessory genome

a) A wide variety of T4ASSs and conjugative elements 
contribute to genome plasticity 

Based on sequence comparisons, T4SSs are categorized 
according to their similarity to the A. tumefaciens VirB/ 
D4 system into type IVA (type F and P) and type IVB 
secretion systems [47]. T4ASSs resemble the VirB/D4 
system of A. tumefaciens, whereas T4BSS proteins are 
more distantly related to the VirB/D4 proteins [48]. 
T4SSs are involved in effector translocation, horizontal 
DNA transfer to other bacteria and eukaryotic cells, in 
DNA uptake from or release into the extracellular 
milieu or in the spread of conjugative plasmids [49]. 
Genome sequence analyses suggest that for L. pneumophila
T4SSs play an important role in adaptation and
virulence, as each genome encodes several T4ASSs in
addition to the essential T4BSS Dot/Icm discussed
above. We identified in each strain either F-type or P-type
T4ASSs or both. Figures 2 and 3 show the
organization of the structural genes encoding these systems
and their localization (chromosomal
or plasmid). The F-type T4ASSs are all predicted
to encode a complete T4SS core as well as the essential
gene products for pilus assembly and mating pair stabilization
that appear to be involved in DNA transfer.
They show homology and colinearity with the tra region
of the E. coli F plasmid [50] and with the recently
described tra region of Rickettsia bellii [51]. In L. pneumophila
strain Philadelphia (Tra5) and L. longbeachae
strain NSW (Tra6), where the system has a chromosomal
localization, it is inserted in a tRNA gene, and flanking
repeats are present as well as a gene coding for an
integrase, suggesting that these T4SSs are mobile (Figure
2). Furthermore, comparison of amino acid identities
revealed that the Tra region on the L. pneumophila
strain Paris plasmid (Tra1) shows much higher identity
with the Tra region located on the L. longbeachae plasmid
(Tra4) than with those of the different L. pneumophila
strains (Paris-Tra1, Lens-Tra3 or Lorraine-Tra2)
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Table 4 Genes coding for proteins with more than 5 colled coil domains/protein in different bacterial genomes 


Organism | Coiled coil proteins | Gene | Product | Number of coiled coil domains
B. henselae Houston-1 | 0 | | |
Ch. pneumoniae J138 | 0 | | |
Ch. trachomatis D/UW-3 | 0 | | |
C. glutamicum ATCC 13032 | 0 | | |
E. coli O157:H7 | 1 | ECH74115_2173 | tail length tape measure protein | 5
H. influenzae Rd KW20 | 0 | | |
H. pylori 26695 | 1 | HP0527 | cag pathogenicity island protein Y | 10
L. pneumophila Corby | 7 | lpc1??0 | substrate of the Dot/Icm system | 5
 | | lpc1151 | substrate of the Dot/Icm system | 6
 | | lpc1452 | substrate of the Dot/Icm system | 6
 | | lpc1611 | hypothetical protein | 12
 | | lpc1987 | substrate of the Dot/Icm system, effector protein B | 9
 | | lpc2349 | substrate of the Dot/Icm system, LidA | 6
 | | lpc3079 | substrate of the Dot/Icm system, effector protein A | 5
L. pneumophila HL06041035 | 10 | [illegible] | substrate of the Dot/Icm system, LidA | [illegible]
 | | [illegible] | substrate of the Dot/Icm system | [illegible]
 | | lpv1966 | substrate of the Dot/Icm system | 5
 | | lpv1967 | substrate of the Dot/Icm system | 6
 | | lpv2269 | substrate of the Dot/Icm system | 7
 | | lpv2408 | conserved protein of unknown function | 5
 | | lpv2816 | substrate of the Dot/Icm system, effector protein B | 10
 | | lpv2959 | chromosome segregation SMC protein | 9
 | | lpv3144 | substrate of the Dot/Icm system, effector protein A | 5
 | | lpv3184 | substrate of the Dot/Icm system, SidH | 9
L. pneumophila Lens | 7 | [illegible] | substrate of the Dot/Icm system | [illegible]
 | | lpl1660 | substrate of the Dot/Icm system | 7
 | | lpl1661 | substrate of the Dot/Icm system | 6
 | | lpl1941 | substrate of the Dot/Icm system | 5
 | | lpl2084 | substrate of the Dot/Icm system | 5
 | | lpl2411 | substrate of the Dot/Icm system, effector protein B | 9
 | | lpl2708 | substrate of the Dot/Icm system, effector protein A | 5
L. pneumophila Lorraine | 10 | lpo1024 | substrate of the Dot/Icm system, LidA | 6
 | | lpo1608 | substrate of the Dot/Icm system | 6
 | | lpo1735 | substrate of the Dot/Icm system | 7
 | | lpo1736 | substrate of the Dot/Icm system | 5
 | | lpo2060 | substrate of the Dot/Icm system | 6
 | | lpo2216 | substrate of the Dot/Icm system, SdeC | 5
 | | lpo2680 | substrate of the Dot/Icm system, effector protein B | 9
 | | lpo2896 | chromosome segregation SMC protein | 9
 | | lpo3083 | substrate of the Dot/Icm system, effector protein A | 5
 | | lpo3123 | substrate of the Dot/Icm system | 9
L. pneumophila Paris | 6 | lpp1002 | substrate of the Dot/Icm system, LidA | 6
 | | lpp1546 | substrate of the Dot/Icm system | 6
 | | lpp1666 | substrate of the Dot/Icm system | 7
 | | lpp1952 | substrate of the Dot/Icm system | 6
 | | lpp2555 | substrate of the Dot/Icm system, effector [remainder illegible] | 10
 | | lpp2883 | substrate of the Dot/Icm system | 6
L. pneumophila Philadelphia | 8 | lpg1355 | substrate of the Dot/Icm system, SidG protein | 5
 | | lpg1588 | substrate of the Dot/Icm system | 6
 | | lpg1?01 | substrate of the Dot/Icm system | 5
 | | lpg1702 | substrate of the Dot/Icm system | 6
 | | lpg2??? | protein of unknown function | 5
 | | lpg?4?? | substrate of the Dot/Icm system, effector protein B | [illegible]
 | | lpg2793 | substrate of the Dot/Icm system, effector protein A | 5
 | | lpg2829 | substrate of the Dot/Icm system | 8
L. monocytogenes EGD-e | 3 | lmo0650 | hypothetical protein | 5
 | | lmo0955 | hypothetical protein | 5
 | | lmo1224 | hypothetical protein | 5
M. tuberculosis F11 | 1 | TBFG_12936 | chromosome partitioning protein Smc | 10
M. tuberculosis H37Ra | 1 | MRA_2947 | putative chromosome segregation Smc | 10
N. meningitidis MC58 | 0 | | |
P. aeruginosa LESB58 | 11 | PLES_08211 | putative tail length tape measure protein | 7
 | | PLES_12531 | hypothetical protein | 7
 | | PLES_12541 | hypothetical protein | 5
 | | PLES_13581 | putative tail length tape measure protein | 7
 | | PLES_15241 | electron transport complex protein RnfC | [illegible]
 | | PLES_15871 | hypothetical protein | 6
 | | PLES_36651 | putative ClpA | [missing]
 | | PLES_38011 | putative chromosome segregation protein | 11
 | | PLES_46621 | putative exonuclease | 13
 | | PLES_50721 | hypothetical protein | 6
 | | PLES_55491 | putative outer membrane protein precursor | 5
R. felis URRWXCal2 | 2 | RF_0022 | putative surface cell antigen sca1 | 7
 | | RF_0725 | antigenic heat-stable 120 kDa protein | 5
R. prowazekii Madrid E | 0 | | |
R. typhi Wilmington | 0 | | |
S. typhimurium LT2 | 5 | STM0395 | exonuclease subunit SbcC | 7
 | | STM0567 | putative DNA repair ATPase | 7
 | | STM0994 | chromosome partition protein MukB | 10
 | | STM1041 | minor tail protein | 5
 | | STM3199 | hypothetical protein | 5
S. flexneri 2a 2457T | 1 | S0984 | fused chromosome partitioning protein | 10
Synechocystis sp. PCC 6803 | 2 | sll1772 | MutS2 protein | 5
 | | slr1301 | hypothetical protein | 6
S. pneumoniae D39 | 4 | SPD_0126 | exported protein of unknown function | 6
 | | SPD_0710 | putative septation ring formation regulator EzrA | 7
 | | SPD_1104 | chromosome partition protein Smc | 10
 | | SPD_2017 | exported protein of unknown function | 6
W. pipientis wMel | 0 | | |
X. fastidiosa 9a5c | 0 | | |
Y. pestis KIM | 4 | y0227 | hypothetical protein | 6
 | | y0976 | ATP-dependent dsDNA exonuclease | 12
 | | y2765 | chromosome partition protein MukB | 10
 | | yapB | autotransporter | 6

?, character illegible in the source; [illegible], entry not recoverable from the source

(Figure 2). Thus, these systems seem to be transferred
horizontally via plasmids but are also able to integrate
into the genome, similar to what was reported for the Lvh
region [52].

The F-type T4SSs encode long, flexible pili that allow
donors to mate in liquid and on solid media with equal
efficiencies [53]. In contrast, P-type T4SSs such as those
described in P. aeruginosa encode short, rigid conjugative pili
that allow surface mating. Homologues of this system
are also present in the Legionella genomes. They were
initially described in two genomic islands of L. pneumophila
strain Corby (Figure 3; Trb1 and Trb2) [54]. We
show here that they are also present in the chromosomes
of L. pneumophila strain Lorraine (Trb3) and L.
longbeachae NSW150 (Trb4) (Figure 3). Again, flanking
repeats are found for all T4SS regions, suggesting
mobility, and the protein identity values and GC content
values of the tra-trb genes are higher than the genomic
average (38%), again supporting horizontal rather than
vertical transmission.

Another intriguing feature of these regions is that sev- 
eral transposases and phage related proteins are present 
in each of the tra clusters as well as genes coding for 
homologues of a putative phage repressor protein 
(PrpA) and for homologues of LvrA, LvrB and LvrC, 
first described for the Lvh region of L. pneumophila. 
LvrC is a homologue of CsrA, a protein crucial for the 
regulation of the switch between replicative and 



Table 5 Percentage of nucleotide identity of orthologous dot/icm genes with respect to the L. pneumophila
Philadelphia sequence

Gene name | Length (nts) | Phila | Paris | Id | Lens | Id | Lorraine | Id | HL06041035 | Id | Corby | Id | L. long | Id
icmT | 261 | lpg0441 | lpp0507 | 99.6 | lpl0483 | 99.1 | lpo0507 | 100 | lpv0541 | 96 | lpc2902 | 99.2 | llo2795 | 75.2
icmS | 345 | lpg0442 | lpp0508 | 98.5 | lpl0484 | 98.8 | lpo0508 | 99.1 | lpv0542 | 94.4 | lpc2901 | 98.3 | llo2794 | 76.9
icmR | 363 | lpg0443 | lpp0509 | 96.9 | lpl0485 | 98.3 | lpo0509 | 97.8 | lpv0543 | 97.5 | lpc2900 | 96.9 | |
icmQ | 576 | lpg0444 | lpp0510 | 97 | lpl0486 | 99 | lpo0510 | 98 | lpv0544 | 98 | lpc2899 | 98 | llo2792 | 70.7
icmP/dotM | 1131 | lpg0445 | lpp0511 | 98 | lpl0487 | 99 | lpo0511 | 98 | lpv0545 | 98 | lpc2898 | 99 | llo2791 | 74.5
icmO/dotL | 2352 | lpg0446 | lpp0512 | 98.4 | lpl0488 | 97.7 | lpo0512 | 98.1 | lpv0546 | 98.3 | lpc2897 | 98.3 | llo2790 | 77.7
icmN/dotK | 570 | lpg0447 | lpp0513 | 99.3 | lpl0489 | 98.6 | lpo0513 | 98.9 | lpv0547 | 99.6 | lpc2896 | 99.7 | llo2789 | 67.3
icmM/dotJ | 285 | lpg0448 | lpp0514 | 97.9 | lpl0490 | 97.9 | lpo0514 | 97.9 | lpv0548 | 99.3 | lpc2895 | 98.6 | llo2788 | 61.7
icmL/dotI | 639 | lpg0449 | lpp0515 | 99.8 | lpl0491 | 99.4 | lpo0515 | 99.4 | lpv0549 | 99.8 | lpc2894 | 99.5 | llo2787 | 78.6
icmK/dotH | 1083 | lpg0450 | lpp0516 | 94.8 | lpl0492 | 94.3 | lpo0516 | 95.2 | lpv0550 | 94.4 | lpc2893 | 94.7 | llo2786 | 71.2
icmE/dotG | 3147 | lpg0451 | lpp0517 | 93.7 | lpl0493 | 94.0 | lpo0517 | 94 | lpv0551 | 94 | lpc2892 | 94.3 | llo2785 | 69.1
icmG/dotF | 810 | lpg0452 | lpp0518 | 98 | lpl0494 | 97 | lpo0518 | 98 | lpv0552 | 98 | lpc2891 | 97 | llo2784 | 55.7
icmC/dotE | 585 | lpg0453 | lpp0519 | 99.6 | lpl0495 | 99.1 | lpo0519 | 99.7 | lpv0553 | 99.3 | lpc2890 | 54 | llo2783 | 69.1
icmD/dotP | 399 | lpg0454 | lpp0520 | 97 | lpl0496 | 98 | lpo0520 | 97 | lpv0554 | 98 | lpc2889 | 97 | llo2782 | 77.3
icmJ/dotN | 627 | lpg0455 | lpp0521 | 99 | lpl0497 | 98 | lpo0521 | 99 | lpv0555 | 99 | lpc2888 | 98 | llo2781 | 79.4
icmB/dotO | 3030 | lpg0456 | lpp0522 | 98.1 | lpl0498 | 98.3 | lpo0522 | 98.3 | lpv0556 | 98.2 | lpc2887 | 97.6 | llo2780 | 76.4
icmF | 2922 | lpg0458 | lpp0524 | 98.2 | lpl0500 | 98.5 | lpo0524 | 98.3 | lpv0558 | 98.5 | lpc2885 | 98.2 | llo3075 | 69.5
icmH/dotU | 786 | lpg0459 | lpp0525 | 99.4 | lpl0501 | 99.5 | lpo0525 | 99.7 | lpv0559 | 99 | lpc2884 | 99 | llo3074 | 68.8
dotD | 492 | lpg2674 | lpp2728 | 98 | lpl2601 | 98 | lpo2953 | 98 | lpv3018 | 98 | lpc0463 | 99 | llo0369 | 76.5
dotC | 912 | lpg2675 | lpp2729 | 98.7 | lpl2602 | 98.5 | lpo2954 | 98.8 | lpv3019 | 98.6 | lpc0462 | 99.9 | llo0368 | 74.8
dotB | 1134 | lpg2676 | lpp2730 | 99 | lpl2603 | 98 | lpo2955 | 98 | lpv3020 | 98 | lpc0461 | 99 | llo0367 | 76
dotA | 3108 | lpg2686 | lpp2740 | 83.3 | lpl2613 | 96.8 | lpo2967 | 83 | lpv3032 | 83.6 | lpc0450 | 85.8 | llo0364 | 51.4
icmV | 456 | lpg2687 | lpp2741 | 91 | lpl2614 | 91 | lpo2968 | 91 | lpv3033 | 92 | lpc0449 | 92 | llo0363 | 64.3
icmW | 456 | lpg2688 | lpp2742 | 95.1 | lpl2615 | 97.6 | lpo2969 | 95.1 | lpv3034 | 95.4 | lpc0448 | 95.1 | llo0362 | 79.3
icmX | 1419 | lpg2689 | lpp2743 | 84.3 | lpl2616 | 85.2 | lpo2970 | 85.6 | lpv3035 | 85.6 | lpc0447 | 84.1 | llo0361 | 46.9

Id, identity (%); Phila, L. pneumophila Philadelphia; L. long, L. longbeachae



[Figure 2 image not reproduced. Panels: Tra1 (Paris, plasmid), Tra2 (Lorraine, plasmid), Tra3 (Lens, plasmid), Tra4 (L. longbeachae, plasmid), Tra5 (Philadelphia, chromosome), Tra6 (L. longbeachae, chromosome), E. coli (plasmid) and R. bellii (chromosome).]

Figure 2 Schematic representation of F-type IV secretion systems (T4ASS) for conjugal DNA transfer of L. pneumophila. In green and
orange, tra and trb genes, respectively. L. long, Legionella longbeachae; P, plasmid; C, chromosome; ycaO, protein of unknown function with a
YcaO-like domain; tfu, protein of unknown function with a TfuA domain; pil, pilus assembly protein precursor; t, transposase; E. coli, Escherichia
coli; R. bellii, Rickettsia bellii; pha, phage repressor; int, integrase; pin, site-specific DNA recombinase e14 prophage; R, repeat. Yellow squares
represent flanking repeats, with length and percentage of identity between repeats in parentheses. tRNAs: position in the genome in parentheses.



[Figure 3 image not reproduced. Panels: Trb1 (Corby, chromosome), Trb2 (Corby, chromosome), Trb3 (Lorraine, chromosome), Trb4 (L. longbeachae, chromosome), A. tumefaciens (Ti plasmid) and P. aeruginosa (chromosome).]

Figure 3 Schematic representation of P-type IV secretion systems (T4ASS) for conjugal DNA transfer of L. pneumophila. In green and
orange, tra and trb genes, respectively. L. long, Legionella longbeachae; P, plasmid; C, chromosome; ?, protein of unknown function; A.
tumefaciens, Agrobacterium tumefaciens; P. aeruginosa, Pseudomonas aeruginosa; t, transposase; pha, phage repressor; int, integrase. Pseudogenes
are shown in dashed squares. Yellow squares represent flanking repeats, with length and percentage of identity between repeats in parentheses.
tRNAs: position in the genome is given in parentheses.

transmissive phase of L. pneumophila [55]. It is tempting
to assume that these CsrA homologues are implicated
in the regulation of the mobility of these islands.
Possibly, depending on the growth phase and/or on
metabolic cues, L. pneumophila might excise these
islands, as multiple copies could be advantageous under
certain conditions or might allow high frequencies of
DNA transfer, leading to fast and efficient adaptation to
new conditions. The genomic features of these islands
suggest a particular mechanism of mobility, which will
be interesting to investigate.

b) The L. pneumophila genomes encode systems specific for 
protection against invading DNA and stabilization of large 
genomic fragments 

Bacteria have developed multiple methods of protection
against mobile genetic elements or bacteriophages. An
example of acquired phage-specific immunity is the
clustered regularly interspaced short palindromic repeats
(CRISPR) loci [56]. Another type of protection may be
conferred by toxin-antitoxin (TA) systems. Bacterial TA
systems are small genetic modules composed of a toxin
and an antitoxin. While toxins are always proteins,
antitoxins are either RNAs (type I and III) or proteins
(type II) [57]. These systems were first described as
being dedicated to plasmid maintenance. Several lines of
research indicate that chromosomal TA systems might
serve as protection against mobile genetic elements such
as plasmids and phages. However, recent studies have
shown that type II systems are also involved in the
stabilization of large genomic fragments and of integrative
conjugative elements [57]. Interestingly, type II TA
systems are themselves thought to be part of the mobilome and
to move from one genome to another through horizontal
gene transfer [57].

Genome analyses identified several TA and CRISPR
systems. Interestingly, we identified only type II TA
systems, of which all except two are in a chromosomal
location (Table 6). However, of the 18 chromosomally
encoded TA systems identified, at least 14 are located on
putative genomic islands or mobile genetic elements.
The two most frequently found TA systems in the L.
pneumophila genomes are homologues of the HigAB
and RelEB systems. HigAB was first described in the
Vibrio cholerae superintegron where it encodes mRNA



Table 6 Genes encoding putative toxin-antitoxin systems in six L. pneumophila genomes

Toxin-antitoxin | L. pneumophila strain: genes (protein length)
higA, higB | Lens: lpl2833 (96)*, lpl2834 (87)*; Philadelphia: lpg2914 (96), lpg2915 (103); HL06041035: lpv3285 (96), lpv3286 (103)
higA, higB | Lens: lpl??92 (93)*, lpl??93 (107)*
higA, higB | Paris: lpp0064 (434)*, lpp0065 (79)*; Lorraine: lpo0072 (432)*, lpo0073 (79)*
Similar to hipA | Paris: lpp2427 (78)*; Lens: lpl2291 (102)*, lpl2292 (312)*; Philadelphia: lpg2369 (102), lpg2370 (312); Corby: lpc2112 (312), lpc2113 (37), lpc2114 (65); Lorraine: lpo2551 (115)*; HL06041035: lpv2676 (102)*, lpv2677 (310)*
yhvA, sohA | Lorraine: lpo??74 (168)*, lpo??75 (115)*
relE, relB | Paris: plpp0090 (83), plpp0089 (95); Lens: lpl15?? (82)*, lpl1588 (85)*
relE, relB | Lens: lpl??84 (84)*; Corby: lpc2177 (93)*, lpc2178 (88)*; Lorraine: lpo0??0 (93)*, lpo0119 (86)*
parE, parD | strain not recoverable from the source: lp?2361 (98)*, lp?2360 (84)*
pemK | Lorraine: lpo0114 (106)

*TA systems located on putative genomic islands. In parentheses, length of the corresponding protein. ?, character illegible in the source.

cleaving enzymes and can stabilize plasmids [58]. RelEB
was shown, when introduced into the E. coli chromosome,
to prevent deletion of flanking DNA and thus to
diminish large scale genome reduction [59]. The same
function was shown for the ParED system of Vibrio
vulnificus, homologues of which are also present in one of
the L. pneumophila genomes (Table 6). Thus, the different
L. pneumophila TA systems might be important for
stabilization of plasmids and integrative conjugative
elements and for protection against invasion of plasmids,
phages, or other mobile genetic elements.

The CRISPR/cas system was shown to provide resistance
against invading viruses and plasmids and has
been identified in many bacteria and archaea [60].
CRISPR/cas loci are also present in the L. pneumophila
genomes of strains Paris, Lens, Alcoy and 130b but are
absent from strains HL06041035 and Lorraine. According
to the cas genes, the CRISPR locus of Paris is closely
related to that of strain 130b. In contrast, the one of
strain Lens located on the plasmid is closely related to
the chromosomal CRISPR locus of strain Alcoy, as
previously described [61]. Strain Lens carries a second
CRISPR locus on the chromosome; however, it does not
seem to be functional like the one encoded by strain
Alcoy. Probably strong protection against invading
phages is not extremely important, as not all L. pneumophila
strains contain CRISPR loci. This may be related
to their intracellular lifestyle or to the fact that, despite
their widespread occurrence in aquatic environments, only few
bacteriophages that specifically infect Legionella seem to
exist [62].

c) Accessory genome of strains Lorraine and HL 0604 1035 

To gain insight into the genetic basis of the two
newly sequenced strains, possibly implicated in their
different disease frequencies (Lorraine is a newly emerging
endemic clone and strain HL 0604 1035 is an L.
pneumophila Sg1 strain never isolated from disease), we
analyzed the specific gene content of each of these
strains in more depth. Strain HL 0604 1035 contains 92
and strain Lorraine 148 genes without homology to any
gene of the other five sequenced L. pneumophila strains,
of which the majority (60 in strain HL 0604 1035 and
73 in strain Lorraine) code for proteins of unknown
function (Additional file 2, Table S2 and Additional file
3, Table S3). Among the genes in these two genomes
that lack an ortholog in the other sequenced L. pneumophila
genomes, about 50% are clustered on three large
genomic islands. One genomic island (GI-HL1) of 45 kb
spans from lpv2637 to lpv2691. It is bordered by a Met
tRNA gene and encodes a phage-related integrase. A
second putative mobile element (GI-HL2) of 27 kb
contains the region from lpv0193 to lpv0226. It is bordered
on one side by an integrase and a reverse transcriptase
(lpv0225) and on the other side by a prophage
Rac integrase and a phage excisionase. Strain Lorraine
also contains a large genomic island (GI-Lo1) of 69 kb
that spans from lpo2442 to lpo2531. It is inserted in a
Met tRNA gene and contains a phage-related integrase and
flanking repeats of 72 nts. Additional, smaller genomic
islands seem to be present; however, their borders are
difficult to define. Thus, most of the strain-specific genes
seem to have been acquired by HGT through the mobility of
genomic islands.

Only for a few of the specific genes can a putative function
be predicted, such as genes coding for proteins involved
in sugar and nucleotide metabolism, for a uridine
diphosphoglucuronate 5'-epimerase or for a UDP-glucose
6-dehydrogenase. Furthermore, a specific ANK motif-containing
protein and a leucine-rich repeat protein are
present in strain HL 0604 1035. In strain Lorraine we
identified mainly specific metabolic enzymes, such as a
putative flavanone 3-dioxygenase, an enzyme involved in
flavonoid metabolism and in the biosynthesis of
phenylpropanoids, which are secondary metabolites of
plants and algae. In addition, lpo2614 is predicted to
encode a kynurenine-oxoglutarate transaminase, an
enzyme that is part of tryptophan metabolism, and
lpo2960 codes for a putative glycolate oxidase that
catalyses the conversion of glycolate and oxygen to
glyoxylate and hydrogen peroxide. lpo2502 codes for a homologue
of CsbD, a general stress response protein of Bacillus
subtilis [63]. However, the best BLASTp hit is with the
homologue of Protochlamydia amoebophila, an
Acanthamoeba sp. symbiont [64]. Probably this gene has been
acquired by HGT between these two bacteria within
their amoeba host. Quite surprisingly, we identified a
gene coding for a putative methyl-accepting chemotaxis
sensory transducer (lpv1770), although none of the L.
pneumophila strains analyzed to date encodes a chemotaxis
system. This gene shares 71.34% amino acid identity
with Llo3301 of L. longbeachae, a protein that is part of
its chemotaxis system [28], which is also present in L.
drancourtii [65]. Probably a common ancestor encoded a
chemotaxis system that was lost in L. pneumophila through a
deletion and degradation process.

d) Shared genome of the epidemic strains Paris and
Lorraine

A search for genes shared by the two endemic strains
but absent in all other strains identified only three genes
that fulfilled these criteria and for which a function
could be predicted. These encode the alpha, beta and
gamma subunits of a putative thiocyanate hydrolase
(lpo1236, lpo1237, lpo1238 and lpp1219, lpp1220,
lpp1221). Most interestingly, these strains are both common
in France, and strain Paris is also distributed world-wide
[10], suggesting a better niche adaptation.
Indeed, thiocyanate compounds are used for cleaning
water circuits, and these strains are thus probably able to
better resist these treatments [66]. Furthermore, strain
Alcoy, which is responsible for several outbreaks and many
cases of Legionnaires' disease in Spain, also contains
these genes [61]. The genes coding for the putative
thiocyanate hydrolase have a GC content of 41-43%, which is
significantly higher than the average G+C content of the
L. pneumophila genome, which is 38%. When searching
for the closest homologues by BLAST, we identified them
in the genomes of Rhodococcus opacus strain B4 and
Nocardia farcinica spp.
These two are high G+C Gram-positive bacteria belonging
to the Actinomycetales, which are phylogenetically
not closely related to Legionella, suggesting that L.
pneumophila acquired these genes by horizontal gene
transfer.
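
The GC-content argument above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names and the toy sequence are hypothetical, and only the 38% genome average is taken from the text.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a nucleotide sequence.

    Only unambiguous bases (A, C, G, T) are counted; anything else is ignored.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def gc_deviation(gene_seq: str, genome_gc: float) -> float:
    """Difference, in percentage points, between a gene's GC content
    and the genome-wide average supplied by the caller."""
    return gc_content(gene_seq) - genome_gc


if __name__ == "__main__":
    # Toy sequence for illustration only, NOT a real thiocyanate hydrolase gene.
    toy_gene = "GCGCATGCATGCGGCCATAT"
    # 38.0 is the L. pneumophila genome average quoted in the text.
    print(gc_content(toy_gene), gc_deviation(toy_gene, genome_gc=38.0))
```

A positive deviation of several percentage points, as reported for the putative thiocyanate hydrolase genes (41-43% versus 38%), is one of several signals that can hint at foreign origin; on its own it is not proof of horizontal transfer.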

Taken together, the analysis of the accessory gene
content showed again that L. pneumophila genomes
show high plasticity due to mobile genetic elements and
HGT. No specific virulence-related genes explaining
their different disease frequencies have been identified.
However, the identification of a specific thiocyanate
hydrolase might explain the wide distribution of strains
Paris and Lorraine, as it may allow them to be better
adapted to artificial water systems.

Evolutionary genomics 

Phylogenetic reconstruction reveals extensive recombination 

To analyze the relationship among the six different L.
pneumophila strains, a phylogenetic reconstruction was
done based on a multilocus sequence analysis (MLSA) approach
using 31 genes selected according to Zeigler [67] (Table
7 and Additional file 4, Table S4). These 31 genes were
chosen as they had been shown to be powerful for
predicting the relatedness of bacterial genomes [67]. The
phylogeny obtained from their concatenated alignment
showed a well-resolved topology with bootstrap values


Table 7 Characteristics of the 31 genes used for phylogenetic reconstruction

Gene name | Product | Label (a) | Function | Length (nts)
uvrB | Excinuclease ABC, subunit B | lpp0086 | DNA replication, recombination, and repair | 1992
pgk | Phosphoglycerate kinase | lpp0151 | Glycolysis/gluconeogenesis | 1191
rpoA | RNA polymerase, alpha subunit | lpp0419 | Transcription | 993
ffh | Signal recognition particle protein, GTPase | lpp0467 | Transport and binding proteins | 1377
serS | Seryl tRNA synthetase | lpp0575 | tRNA aminoacylation | 1281
proS | Prolyl-tRNA synthetase | lpp0749 | tRNA aminoacylation | 1710
glyA | Serine hydroxymethyltransferase | lpp0791 | Glycine/serine hydroxymethyltransferase | 1254
dnaB | Replicative DNA helicase | lpp0803 | DNA replication, recombination, and repair | 1383
gpi | Glucose-6-phosphate isomerase | lpp0825 | Glycolysis/gluconeogenesis | 1500
lig | DNA ligase | lpp1020 | DNA replication, recombination, and repair | 2022
cysS | Cysteinyl-tRNA synthetase | lpp1271 | tRNA aminoacylation | 1371
trpS | Tryptophanyl tRNA synthetase | lpp1399 | tRNA aminoacylation | 1215
aspS | Aspartyl-tRNA synthetase | lpp1434 | tRNA aminoacylation | 1782
ruvB | Holliday junction DNA helicase | lpp1534 | tRNA aminoacylation | 1011
nrdA | Ribonucleoside-diphosphate reductase, alpha subunit | lpp1738 | Deoxyribonucleotide/ribonucleoside metabolism | 2829
recA | Bacterial DNA recombination protein | lpp1765 | DNA replication, recombination, and repair | 1047
tig | Trigger factor | lpp1830 | Protein folding and stabilization | 1332
lepA | GTP-binding membrane protein | lpp1837 | Translation | 1833
metK | S-adenosylmethionine synthetase | lpp2004 | tRNA aminoacylation | 1149
dnaJ | Heat shock protein | lpp2006 | Protein folding and stabilization | 1140
argS | Arginyl tRNA synthetase | lpp2013 | tRNA aminoacylation | 1770
eno | Enolase | lpp2020 | Glycolysis/gluconeogenesis | 1269
ftsZ | Cell division protein | lpp2662 | Cell division | 1197
uvrC | Excinuclease ABC, subunit C | lpp2698 | DNA replication, recombination, and repair | 1857
dnaX | DNA polymerase III, subunits gamma and tau | lpp2802 | DNA replication, recombination, and repair | 1671
recN | DNA repair protein | lpp2877 | DNA replication, recombination, and repair | 1668
metG | Methionyl tRNA synthetase | lpp2941 | tRNA aminoacylation | 2013
rho | Transcription terminator factor | lpp3002 | Translation | 1262
atpD | ATP synthase F1, subunit beta | lpp3053 | ATP-proton motive force interconversion | 1377
atpA | ATP synthase, subunit alpha | lpp3055 | ATP-proton motive force interconversion | 1554
thdF | GTP binding protein, thiophene oxidation | lpp3073 | tRNA and rRNA base modification | 1341

(a) with respect to strain Paris; nts, nucleotides

over 50%. To ascertain the reliability of the obtained
phylogenetic tree, we established individual phylogenies
for each of the 31 genes. Surprisingly, the incongruence
among several gene trees was high. In addition, the
Consense program results did not support any node at
the 50% level or above. To further investigate these results,
we undertook a second analysis using a Shimodaira-Hasegawa
(SH) test and compared the topologies of the individual
alignments of each gene and the concatenated alignment of
the 31 genes. As shown in Additional file 5, Table S5,
the likelihood-based SH test for alternative tree
topologies identified striking discordances. A possible
explanation for the incongruences among the
phylogenies obtained in our study is the presence of
recombination events.
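
To make the notion of node support across gene trees concrete, the following minimal Python sketch counts how often each clade recurs in a set of trees and keeps those found in more than half of them. It is a simplified, rooted illustration and not the algorithm of the Consense program, which works on bipartitions; the three topologies are hypothetical and are not results from this study.

```python
from collections import Counter


def leaves(tree):
    """Return the frozenset of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Collect every internal clade below the root as a frozenset of leaves."""
    found = set()

    def walk(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.add(leaves(node))
        for child in node:
            walk(child, False)

    walk(tree, True)
    return found


def clade_support(trees):
    """Map each clade to the fraction of trees in which it occurs."""
    counts = Counter()
    for tree in trees:
        counts.update(clades(tree))
    return {clade: n / len(trees) for clade, n in counts.items()}


def majority_clades(trees, threshold=0.5):
    """Keep only clades present in more than `threshold` of the trees."""
    return {c: s for c, s in clade_support(trees).items() if s > threshold}


# Hypothetical gene-tree topologies, invented for illustration only.
gene_trees = [
    ((("Paris", "Lens"), "Corby"), ("Philadelphia", "Lorraine")),
    ((("Paris", "Corby"), "Lens"), ("Philadelphia", "Lorraine")),
    ((("Paris", "Lens"), "Philadelphia"), ("Corby", "Lorraine")),
]

if __name__ == "__main__":
    for clade, support in sorted(majority_clades(gene_trees).items(),
                                 key=lambda item: sorted(item[0])):
        print(sorted(clade), round(support, 2))
```

When recombination shuffles gene histories, few clades recur across gene trees, so few or none pass the 50% threshold, which is the pattern reported above.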

To explore whether recombination events
are present in the selected genes, we undertook an
in-depth analysis using the program RDP [68]. Indeed, the
analysis of individual genes identified intragenic
recombination in 9 of the 31 genes (Table 8). Numerous
additional recombination events were detected with the
concatenated alignment of the 22 genes for which no
intragenic recombination had been shown (Table 8). To
minimize false positive recombination events, only those
that were supported by at least two of the six methods
used in RDP were taken into account. However, except for
one, all were supported by at least three methods. No
artifacts resulting from positive selection should be included
in this analysis, since all of the genes are either
informational or operational (housekeeping). Most interestingly,
four of the genes in which intragenic recombination was
detected are housekeeping genes (pgk, atpD, ffh, metK).
Housekeeping genes allow the extent of recombination
within bacterial species to be estimated, since the presence of
recombination in such "normally recombination free
genes" is indicative of a high rate of recombination [22].
Similarly, antigen-coding genes of Legionella [18,69] and
certain other genomic regions [17,19,70-72] were
reported to show recombination events. Another example
of intragenic recombination in L. pneumophila is the
rtxA gene, which contains a long tandem repeated domain
of variable copy number and sequence [4,10,73]. rtxA of
strains Lorraine and Corby share the same repeats,
whereas the other strains have unique types of repeats.
However, when including the newly sequenced strains
Lorraine and HL 0604 1035, we found that repeats of the
same type are shared by HL 0604 1035 and Philadelphia
and by Lorraine and Lens (Figure 4 and Additional file 6,
Table S6), further substantiating high intragenic
recombination among strains.

To reconstruct the phylogenetic history of the species
L. pneumophila we thus used the concatenated alignment
of the 31 genes described above. It gave a topology with
high bootstrap support; however, recombination bias may
result in high support for the wrong tree. To avoid
possible bias we analyzed the concatenated alignment of
the 31 genes using a split tree decomposition, which
allows a more realistic representation of the
phylogenetic relationships. Furthermore, we constructed
a classical bifurcating tree using the highest possible
number of genes [all orthologs among the six strains
with (1867 genes) and without (2434 genes) L.
longbeachae as outgroup]. As shown in Figure 5, the
Splits Decomposition phylogeny is network-like,
suggesting incompatible partitions within the sequence
data, which commonly arise from recombination. Although
the phylogeny based on the orthologous genes can also be
affected by recombination, the high number of
informative sites included in this data set should allow
the correct history of the species to be recovered, as
has been shown previously for other closely related
bacterial species [74].

Taken together, in contrast to previous studies, which
reported that the species L. pneumophila is a clonal
population [13,14], our results clearly show that a high
recombination rate shapes the L. pneumophila genomes.
This finding is in line with the natural competence of
L. pneumophila. However, some worldwide distributed L.
pneumophila clones have been described (e.g. [10]),
suggesting that L. pneumophila is able to develop a
unique genetic population structure within a particular
region or environment, as reported recently [72].

Recombination of large chromosomal regions of over 200
kb among L. pneumophila strains

Our recombination analysis revealed not only intragenic
recombination events but also intergenic recombination,
as recombination was detected when using the entire
alignment even with only recombination-free genes
(Table 8). This finding may be explained by the
recombination of fragments encompassing several genes or
by multiple recombination events involving smaller
tracts along the genome. To test this hypothesis we used
a method recently developed for the analysis of
Streptococcus agalactiae genomes [75]. In order to
identify patterns of recombination, nucleotide
substitutions between strains were counted in sliding
windows across the previously defined core chromosome,
representing 15 possible pairwise comparisons. Each
pairwise comparison revealed highly conserved regions
(<0.05% polymorphism on average) and less-conserved
regions (>0.7% polymorphism), suggesting the occurrence
of recombinational exchanges. When analyzing the
different strains in depth we identified in each genome
several regions with very low polymorphism (below 0.05%),
suggesting that DNA exchange of these fragments has
occurred between the different L. pneumophila strains.
Most interestingly, the two French strains Paris and HL
0604 1035 that have been present in France for several years
Table 8 Intragenic and intergenic recombination in six L. pneumophila genomes predicted on individual genes and on
combined data using six different methods

Data set | Event number | Putative recombinant sequences | RDP | GENECONV | BootScan | MaxChi | Chimaera | SiScan
metG | | Lorraine, Lens | NS | NS | NS | Yes | Yes | Yes
[illegible] | | Philadelphia | NS | NS | Yes | Yes | Yes | Yes
 | | [illegible] | NS | NS | NS | Yes | Yes | Yes
proS | | HL06041035 | Yes | Yes | Yes | Yes | Yes | Yes
 | | Philadelphia | NS | Yes | NS | Yes | Yes | Yes
cysS | | Philadelphia | NS | NS | NS | Yes | Yes | NS
[illegible] | | Lorraine | NS | Yes | Yes | NS | NS | NS
uvrC | | Lens, Philadelphia, Lorraine | NS | [illegible] | NS | Yes | Yes | Yes
[illegible] | | Lens | NS | [illegible] | Yes | Yes | Yes | Yes
 | | Paris, HL06041035 | NS | [illegible] | NS | Yes | Yes | Yes
pgk | | [illegible] | NS | NS | NS | Yes | Yes | Yes
atpD | | Corby | NS | NS | [illegible] | Yes | Yes | Yes
Concatenated | 1 | Philadelphia | Yes | Yes | Yes | Yes | Yes | Yes
 | 2 | Philadelphia | Yes | Yes | Yes | Yes | Yes | Yes
 | 3 | HL06041035 | Yes | Yes | Yes | Yes | Yes | Yes
 | 4 | HL06041035 | Yes | Yes | Yes | Yes | Yes | Yes
 | 5 | Philadelphia, Corby, Lorraine | Yes | Yes | Yes | Yes | Yes | Yes
 | 6 | Lens | Yes | Yes | Yes | Yes | Yes | Yes
 | 7 | Paris, HL06041035 | Yes | NS | NS | Yes | Yes | NS
 | 8 | Paris | Yes | Yes | Yes | Yes | Yes | Yes
 | 9 | Lens | Yes | Yes | NS | Yes | Yes | Yes
 | 10 | Lens | Yes | Yes | Yes | Yes | Yes | NS
 | 11 | HL06041035 | Yes | Yes | NS | Yes | NS | NS
 | 12 | Paris, HL06041035 | Yes | Yes | Yes | Yes | Yes | Yes
 | 13 | HL06041035, Lens | NS | Yes | NS | Yes | Yes | Yes
 | 14 | Lens, Lorraine | Yes | NS | NS | Yes | Yes | NS
 | 15 | Paris, HL06041035 | Yes | Yes | NS | Yes | Yes | NS
 | 16 | Corby | Yes | NS | NS | Yes | Yes | NS
 | 17 | Lens | NS | Yes | NS | Yes | Yes | NS
 | 18 | HL06041035, Paris | Yes | NS | NS | Yes | Yes | Yes
 | 19 | Corby | Yes | NS | NS | Yes | Yes | NS
 | 20 | Lorraine | Yes | Yes | NS | NS | NS | Yes
 | 21 | Lens | Yes | NS | Yes | NS | NS | Yes
 | 22 | Corby | Yes | NS | Yes | Yes | NS | Yes
 | 23 | Lens | NS | Yes | NS | Yes | NS | NS
 | 24 | Lens | NS | Yes | NS | Yes | NS | Yes
 | 25 | Philadelphia | Yes | NS | NS | Yes | Yes | Yes

NS = non-significant result. Yes = significant result with p-value <0.05 (where P is the highest acceptable probability value of recombination occurrence).



show 15 regions of between 10 and 99 kb in size that
have very low polymorphism and thus seem to have been
exchanged between them (Additional file 7, Figure S1).
In contrast, when comparing strain Lens with the other 5
genomes analyzed here, very few regions with low
polymorphism were detected: two with strain HL 0604 1035
and one with strain Lorraine. Furthermore, no DNA
exchanges seem to have occurred with strains Corby,
Philadelphia or Paris. This indicates that strains that
are frequent in the same environment (e.g. strains Paris
and HL 0604 1035) show high rates of DNA exchange,
probably by conjugation, as suggested for Streptococcus
agalactiae [75] and Enterococcus faecalis [76]. In
contrast, for strain Lens, which has been identified to
date only twice, in Lens (France) and in Germany, very
few DNA transfers with the studied L. pneumophila
strains seem to have taken place. Furthermore, some
regions may also be transferred between several strains.
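The window-based comparison described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes two already aligned sequences of equal length, uses non-overlapping 500 bp windows (the fragment size quoted in the Figure 6 legend), and merges consecutive windows below a threshold into candidate exchanged regions. The function names and the merging rule are hypothetical.

```python
from typing import List, Tuple


def window_polymorphism(ref: str, other: str, window: int = 500) -> List[Tuple[int, float]]:
    """Return (window_start, percent_polymorphism) for non-overlapping windows."""
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(ref), window):
        compared = 0
        diffs = 0
        for x, y in zip(ref[start:start + window], other[start:start + window]):
            # Count substitutions only; skip gaps and ambiguous bases.
            if x in "ACGT" and y in "ACGT":
                compared += 1
                diffs += x != y
        pct = 100.0 * diffs / compared if compared else 0.0
        results.append((start, pct))
    return results


def low_polymorphism_regions(windows: List[Tuple[int, float]],
                             threshold_pct: float = 0.05,
                             window: int = 500) -> List[Tuple[int, int]]:
    """Merge consecutive windows below the threshold into (start, end) regions."""
    regions = []
    current_start = None
    current_end = 0
    for start, pct in windows:
        if pct < threshold_pct:
            if current_start is None:
                current_start = start
            current_end = start + window
        elif current_start is not None:
            regions.append((current_start, current_end))
            current_start = None
    if current_start is not None:
        regions.append((current_start, current_end))
    return regions
```

With 500 bp windows a single substitution already gives 0.2%, so a per-window threshold of 0.05% effectively selects windows without any SNP; averaging over longer stretches, as the region-level percentages in the text imply, would need an additional smoothing step.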

Figure 4 Schematic representation of the repeat regions present in the rtxA gene of L. pneumophila. Colored squares represent repeated
sequences where the same color corresponds to the same type of repeat. Discontinuous lines indicate that the exact number of repeats has not
been defined.



Figure 6 shows the distribution of single-nucleotide
polymorphisms (SNPs) along 330 kb of the genome of L.
pneumophila HL 0604 1035, Philadelphia and Lorraine as
compared to the same region in the genome of strain
Paris. We identified a region of 213 kb with a SNP
frequency of 0.005%. Except for an indel of 158 bp that
shows higher polymorphism, only 11 SNPs are present in
this region. This fragment may have evolved by
conjugative transfer and recombination between strains
Philadelphia and Paris. Among others, this region
carries the genes necessary for lipopolysaccharide
biosynthesis, which are also part of the smaller
fragment that has been exchanged with strain HL 0604
1035. Our analyses suggest that, in addition to frequent
intragenic recombination, recombination and horizontal
transfer of large chromosomal fragments also take place
and shape the chromosomes of L. pneumophila.
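As a quick check of the figures quoted above, the SNP frequency follows directly from the SNP count and the region length. The short sketch below assumes the region is exactly 213,000 bp and that the 11 SNPs exclude the 158 bp indel; the function name is illustrative only.

```python
def snp_frequency_percent(snp_count: int, region_length_bp: int) -> float:
    """Percentage of positions in a region that carry a SNP."""
    return 100.0 * snp_count / region_length_bp


# 11 SNPs over an assumed 213,000 bp region
freq = snp_frequency_percent(11, 213_000)
print(f"{freq:.4f}%")  # prints 0.0052%
```

This is consistent with the reported frequency of about 0.005%.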

Conclusion 

Analysis of the genome sequences of six L. pneumophila
strains shows that the genomes of this environmental
pathogen evolve by frequent HGT and high recombination
rates. Most interestingly, these events take place
between eukaryotes and prokaryotes and among different
strains and species of Legionella. A genome-wide map
analysis of nucleotide polymorphisms among these six
strains demonstrated that each chromosome is a mosaic of
large chromosomal fragments from different origins,
suggesting that exchanges of large DNA regions of over
200 kb have contributed to the genome dynamics in the
natural population. The many T4SS might be implicated in
the exchange of these fragments by conjugal transfer.
Plasmids also play a role in genome diversification;
they are exchanged among strains and circulate even
between different species of Legionella. Importantly,
plasmids seem to excise from and integrate into the
genome, probably depending on environmental cues.
However, L. pneumophila also encodes several
toxin-antitoxin systems that might help to stabilize
certain mobile genetic elements. In the near future, the
analysis of hundreds of genomes thanks to new-generation
sequencing, combined with molecular studies, should
provide further clues about the genetic mechanisms and
the evolutionary forces that shape the Legionella
genomes.



Figure 5 Phylogenetic relationships of the 6 L. pneumophila strains analyzed. a) Neighbor-net constructed from a concatenation of 31
genes from 6 L. pneumophila strains under a GTR model, with associated bootstrap values. b) Likelihood tree topology of L. pneumophila strains
and the outgroup L. longbeachae based on the concatenated orthologous genes present in all strains/species.







Figure 6 Distribution of single-nucleotide polymorphisms (SNPs) along 330 kb of the genomes of L. pneumophila HL 0604 1035, 
Philadelphia and Lorraine. The number of SNPs (y axis) is plotted according to the position of the corresponding 500 bp fragment on the 
strain Paris chromosome (x axis). A straight blue line indicates 0 polymorphism between the two strains. Numbers on the scale bar indicate the 
percentage of polymorphism. The green (+ strand) and red (- strand) lines depict the corresponding genes. 
Methods

Bacterial strains and sequence accession numbers 

The strains sequenced in this study are L. pneumophila
strain Lorraine [EMBL:FQ958210, EMBL:FQ958212] and L.
pneumophila HL 0604 1035 [EMBL:FQ958211]. Strain
Lorraine was isolated in 2004 from a patient and was
recently described as a newly emerging endemic clone
[26]. L. pneumophila strain HL 0604 1035 (ST 734,
Bellingham subgroup of the Dresden panel) was isolated
in 2006 from a water supply system in a French hospital
that it has been colonizing for more than 10 years.

Sequencing and assembly 

The complete genome sequences of L. pneumophila subsp.
pneumophila strain HL06041035 (A) and strain Lorraine
(B) were determined using a Sanger/pyrosequencing hybrid
approach. A shotgun library was constructed with 10 kb
fragments, obtained after mechanical shearing of the
total genomic DNA, and cloned into vector pCNS (pSU
derived). Sequencing with vector-based primers was
carried out using the ABI 3730 Applera Sequencer. A
total of 20736 (A) and 21888 (B) reads (~4-fold
coverage) were analyzed and assembled with 502731 (A)
and 555541 (B) reads (~15-fold coverage) obtained with
the Genome Sequencer GS20 (Roche Applied Science). For
the assembly, we used the Arachne "HybridAssemble"
version (Broad Institute, http://www.broad.mit.edu) that
combines the contigs obtained with 454 sequencing with
Sanger reads. To validate the assembly, the Mekano
interface (Genoscope), based on visualization of clone
links inside and between contigs, was used to check the
clone coverage and misassemblies. In addition, the
consensus was confirmed using Consed functionalities
(http://www.phrap.org): the consensus quality and the
high quality discrepancies. The finishing step was
achieved by PCR, primer walking and in vitro
transposition technology (Template Generation System™ II
Kit; Finnzymes, Espoo, Finland), and a total of 930 (A)
and 999 (B) sequences (109, 165 and 656, respectively,
for L. pneumophila subsp. pneumophila strain HL06041035
and 62, 204 and 733, respectively, for L. pneumophila
subsp. pneumophila strain Lorraine) were needed for gap
closure and quality assessment.

Sequence analysis and annotation 

The two newly sequenced L. pneumophila genomes were
integrated into the MicroScope platform [77] to perform
automatic and expert annotation of the genes, and
comparative analysis with the other L. pneumophila
strains already published. In addition, the annotations
of the previously published genomes were updated. The
system integrates, for each predicted gene, the results
of multiple bioinformatics methods (Blast results on
UniProt and specialized genomic data, InterPro, COG,
PRIAM, synteny group computation using the complete
bacterial genomes available at NCBI RefSeq, etc.; more
information on the syntactic and functional annotation
process is given in [78]). In addition, many genomic and
metabolic comparative tools are also available [77]. For
details see
https://www.genoscope.cns.fr/agc/microscope/home/index.php.

Definition of orthologous genes 

To define orthologous chromosomal genes among the
different L. pneumophila strains, pseudogenes and mobile
elements were not taken into account due to the
difficulty of ortholog assignment for these genes.
Putative orthologous relations were defined as gene
couples fulfilling two criteria: (i) having a
bidirectional best hit (BBH) with an alignment threshold
of 55% identity over at least 60% of the query sequence
and target size, and (ii) being in synteny.
Subsequently, putative genes without any orthologous
relation due to reduced identity percentage were
integrated in a pre-existing orthologue group if they
were flanked by orthologous genes showing gene order
conservation (microsynteny). A final step of manual
curation was carried out for each doubtful case.
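Criterion (i) can be illustrated with a small sketch. It is not the MicroScope implementation: the `Hit` record, its field names and the use of a score to rank hits are assumptions made for the example, and the synteny criterion (ii) and the manual curation step are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Hit:
    """One pairwise similarity hit (hypothetical record layout)."""
    query: str
    target: str
    identity: float   # percent identity
    aln_len: int      # aligned length
    query_len: int
    target_len: int
    score: float      # used to rank hits for one query


def passes_thresholds(h: Hit, min_identity: float = 55.0, min_cov: float = 0.60) -> bool:
    """55% identity over at least 60% of both query and target."""
    return (h.identity >= min_identity
            and h.aln_len / h.query_len >= min_cov
            and h.aln_len / h.target_len >= min_cov)


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep the highest-scoring hit for each query."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.query not in best or h.score > best[h.query].score:
            best[h.query] = h
    return best


def bidirectional_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Gene pairs that are each other's best hit and pass the thresholds."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    pairs = set()
    for a, h in fwd.items():
        back = rev.get(h.target)
        if back is not None and back.target == a and passes_thresholds(h):
            pairs.add((a, h.target))
    return pairs
```

Ranking is done before thresholding here, so a gene whose best hit fails the thresholds is left unpaired rather than matched to a weaker hit; that ordering is a design choice of this sketch.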

Sequence alignments 

For each gene of the selected data set, the nucleotide
sequence was aligned based on the amino acid sequence
using tranalign from the EMBOSS package
(http://emboss.sourceforge.net/). Subsequently, genes
were concatenated in different data sets.
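The idea behind aligning nucleotides on the basis of a protein alignment can be sketched in a few lines: each residue in a gapped protein row is replaced by its codon, and each gap by three gap characters. This is only an illustration of the principle, not the tranalign code; the function name is hypothetical and the coding sequence is assumed to be in frame and to start at the first aligned residue.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an ungapped coding sequence onto a gapped protein alignment row."""
    n_residues = len(aligned_protein.replace("-", ""))
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is too short for the protein row")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")          # a protein gap spans one codon
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


print(thread_codons("M-K", "ATGAAA"))  # prints ATG---AAA
```

Because gaps are always inserted in multiples of three, the reading frame is preserved in the resulting nucleotide alignment.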

Identification of eukaryotic like proteins and eukaryotic 
domain carrying proteins 

Eukaryotic domains were identified by analyzing the
results obtained for all genes using the InterPro
database that is integrated in MaGe. For the
identification of eukaryotic like proteins we developed
a new method. First we constructed two databases, one
containing all and only eukaryotic sequences retrieved
from public databases and a second one containing all
and only prokaryotic sequences. From the second database
we excluded the proteins of bacterial genera for which
eukaryotic like protein domains have been found in high
proportions (e.g. parasites of protozoa) or bacterial
genera that are reported to establish a symbiotic
relationship with amoeba (for a detailed list see
Additional file 8, Table S7). Those proteins that showed
a better normalized blast score against eukaryotic
proteins than against those present in the prokaryotic
database were retrieved as eukaryotic like proteins.
Parameters established for blast were: minimum identity:
25%; minimum ratio with query: 60%; minimum ratio with
target: 50%. The final results were manually checked.
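The comparison step can be sketched as below. The text does not say how the blast score was normalized, so this example assumes, purely for illustration, that the bit score of the best hit is divided by the score of the protein against itself; the `BestHit` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BestHit:
    """Best hit of one protein in one database (hypothetical layout)."""
    bit_score: float
    identity: float     # percent identity
    query_cov: float    # fraction of the query aligned
    target_cov: float   # fraction of the target aligned


def normalized_score(hit: Optional[BestHit], self_score: float) -> float:
    """Score relative to the self-hit; 0.0 if there is no acceptable hit."""
    if hit is None:
        return 0.0
    # Thresholds quoted in the text: 25% identity, 60% query, 50% target.
    if hit.identity < 25.0 or hit.query_cov < 0.60 or hit.target_cov < 0.50:
        return 0.0
    return hit.bit_score / self_score


def is_eukaryotic_like(euk_hit: Optional[BestHit],
                       prok_hit: Optional[BestHit],
                       self_score: float) -> bool:
    """True if the eukaryotic database gives the better normalized score."""
    euk = normalized_score(euk_hit, self_score)
    prok = normalized_score(prok_hit, self_score)
    return euk > 0.0 and euk > prok
```

Candidates flagged this way would still need the manual check mentioned above, since a single best-hit comparison is sensitive to database composition.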

Phylogenetic Analysis 

For phylogenetic reconstruction of the L. pneumophila
strains analyzed in this work several data sets were
used: (i) 31 housekeeping genes described to be
essential for all prokaryotes were selected based on the
study of Zeigler [67] (Table 7 and Additional file 9,
Figure S2) for a multi locus sequence analysis (MLSA)
approach in which each gene was analyzed individually
and as a concatenated alignment, (ii) a concatenated
alignment of 2434 orthologous genes present in all
analyzed L. pneumophila strains, and (iii) a
concatenated alignment of 1867 orthologous genes present
in all analyzed L. pneumophila strains and in the
selected outgroup, Legionella longbeachae strain NSW150.
An analysis of genetic divergence was performed using
DnaSP version 5.00.07 [79] on the 31 selected
housekeeping genes. For phylogenetic reconstruction,
maximum likelihood (ML) methods were used to infer
phylogenetic relationships for all data sets. Prior to
ML analyses, a DNA substitution model for each gene or
data set was selected using Modeltest v3.06 [80] and the
Akaike information criterion. ML heuristic searches were
performed using 500 random taxon-addition replicates
with tree bisection and reconnection (TBR) branch
swapping. ML bootstrap support was determined using 1000
bootstrap replicates. The ML best trees were rooted on
L. longbeachae when it was included. A network
reconstruction was done for data set (i) using
SplitsTree4 (version 4.10) [81]. The NeighborNet method
and the GTR distance model were used to create the
network.

Congruence test 

The 31 genes selected for the MLSA approach were tested
for the significance of topological differences in the
obtained phylogenetic trees using several methods. The
first approach was based on the consensus of individual
gene trees. The consensus tree was inferred using the
CONSENSE program in the PHYLIP package
(http://evolution.genetics.washington.edu/phylip.html),
applying the extended majority rule. Secondly, we tested
the significance of topological differences in
phylogenetic trees using the Shimodaira-Hasegawa (SH)
test. The SH test compares the likelihood score (-lnL)
of a given data set across its ML tree versus the -lnL
of that data set across alternative topologies, which in
this case are the ML phylogenies for other data sets.
The differences in the -lnL values are evaluated for
statistical significance using 1000 replicates based on
the resampling estimated log-likelihood (RELL) method
(PAUP version 4.0b10; http://paup.csit.fsu.edu/). We
applied the test using all the trees obtained with
individual genes, with the concatenated alignment
against the alignment of each individual gene, and with
the alignment of all the 31 genes concatenated.
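The consensus-based check can be illustrated by counting how often each clade is recovered across the individual gene trees. The sketch below is a simplified stand-in for what a consensus program reports, not the PHYLIP algorithm: each gene tree is assumed to be supplied as a set of clades, each clade being a frozenset of strain names, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def clade_support(gene_tree_clades: List[Iterable[Clade]]) -> Dict[Clade, float]:
    """Fraction of gene trees in which each clade appears."""
    counts: Counter = Counter()
    for clades in gene_tree_clades:
        counts.update(set(clades))   # count each clade once per tree
    n_trees = len(gene_tree_clades)
    return {clade: c / n_trees for clade, c in counts.items()}


def majority_clades(gene_tree_clades: List[Iterable[Clade]],
                    threshold: float = 0.5) -> Set[Clade]:
    """Clades found in more than the given fraction of gene trees."""
    support = clade_support(gene_tree_clades)
    return {clade for clade, frac in support.items() if frac > threshold}
```

An empty result from `majority_clades` corresponds to the situation reported in the Results, where no node was supported by at least half of the gene trees.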

Recombination analysis 

The 31 genes selected for the MLSA approach and their
corresponding concatenated alignment were screened for
the presence of putative recombination events by using
RDP 2.0b08 [82]. This program identifies recombinant
sequences and recombination breakpoints applying several
methods. We selected six of them: two phylogenetic
methods (which infer recombination when different parts
of the genome result in discordant topologies), RDP [68]
and Bootscanning [83]; and four nucleotide substitution
methods (which examine the sequences either for a
significant clustering of substitutions or for a fit to
an expected statistical distribution), Maxchi and
Chimaera [84], GeneConv [85] and Sis-scan [86]. We
considered only those recombination events in our
analysis that were identified by at least two methods.
The common settings for all methods were (i) to consider
sequences as circular, (ii) a statistical significance
of P < 0.05, and (iii) a Bonferroni correction for
multiple comparisons implemented in RDP.
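The acceptance rule (an event is kept only if at least two of the six methods call it significant) is easy to express in code. The sketch below assumes that per-method P-values have already been exported and Bonferroni-corrected by RDP; the dictionary layout, the method labels and the example values are hypothetical and do not reproduce RDP's output format.

```python
from typing import Dict, Optional

METHODS = ("RDP", "GENECONV", "BootScan", "MaxChi", "Chimaera", "SiScan")


def supported_events(p_values_by_event: Dict[str, Dict[str, Optional[float]]],
                     alpha: float = 0.05,
                     min_methods: int = 2) -> Dict[str, int]:
    """Return events called significant by at least `min_methods` methods.

    The value for each kept event is the number of supporting methods.
    A missing or None P-value is treated as non-significant (NS).
    """
    kept = {}
    for event, pvals in p_values_by_event.items():
        n_significant = sum(
            1 for m in METHODS
            if pvals.get(m) is not None and pvals[m] < alpha
        )
        if n_significant >= min_methods:
            kept[event] = n_significant
    return kept


# Hypothetical P-values for three candidate events
example = {
    "event_A": {"RDP": 0.001, "MaxChi": 0.01, "Chimaera": 0.02},
    "event_B": {"GENECONV": 0.03, "SiScan": None},
    "event_C": {"RDP": 0.2, "MaxChi": 0.04, "Chimaera": 0.049},
}
print(supported_events(example))  # prints {'event_A': 3, 'event_C': 2}
```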

Additional material 



Additional file 1: Table S1: Nucleotide identity of 140 selected Dot/
Icm substrates of strain Philadelphia and of their orthologs in the L.
pneumophila strains analyzed in this study

Additional file 2: Table S2: Genes specific of strain HL 0604 1035
with respect to strains Paris, Lens, Philadelphia, Corby and Lorraine

Additional file 3: Table S3: Genes specific of strain Lorraine with
respect to strains Paris, Lens, Philadelphia, Corby and HL 0604 1035

Additional file 4: Table S4: Summary of genetic diversity
parameters for the 31 selected L. pneumophila genes used to
establish the phylogeny

Additional file 5: Table S5: Results for the SH Test of alternative
topologies for the 6 analyzed L. pneumophila strains

Additional file 6: Table S6: Conserved domains and repeats of the
rtxA gene in 8 L. pneumophila strains

Additional file 7: Figure S1: Distribution of single-nucleotide
polymorphisms (SNPs) along the genome of L. pneumophila HL
0604 1035 as compared to strains Lens, Philadelphia, Corby and
Lorraine. The number of SNPs (y axis) is plotted according to the
position of the corresponding 500 bp fragment on the strain Paris
chromosome (x axis). A straight blue line indicates 0 polymorphism
between the two strains. Numbers on the scale bar indicate the
percentage of polymorphism. Yellow blocks indicate chromosomal
regions with a SNP frequency lower than 0.005%.

Additional file 8: Table S7: List of bacterial genera removed from
our prokaryotic database

Additional file 9: Figure S2: Distribution of the 31 genes selected
for establishing the phylogeny of L. pneumophila species. The
coordinates are given with respect to the chromosome of L.
pneumophila strain Paris. Numbers next to gene names indicate the first
position of the corresponding gene starting from the origin of
replication.